Tutorial on setting up and using the colorblind package inside Jupyter Notebook

Overview

In this tutorial, I will demonstrate how to use the Python module colorblind to check how your figures look to people with color blindness/deficiency.

Background

According to the NIH's National Eye Institute, about one in 12 men is color blind. Women are less at risk, but about 1 in every 200 experiences color blindness. Although everyone sees colors differently, color blind individuals have a much harder time differentiating between colors. Research shows that there are different types of color blindness, but the 2 most common are red-green weakness/blindness and blue-yellow weakness, which will be demonstrated later in the tutorial.

It is important for data scientists and analysts to do a spot check of their data visualizations every now and then - our figures are only useful when people can interpret them, and someone who can't tell a red dot apart from a green dot would not find any use for a scatterplot with only red and green dots.

Motivations

Colblindor (https://www.color-blindness.com/coblis-color-blindness-simulator/) is a website where users can upload their own images and apply color filters to see how a colorblind person would see colors in a picture. The website is great for those who are curious about what the world looks like to a colorblind person. Although it is a very useful tool, as an aspiring data scientist, I have used it fewer times than I would have liked, for several reasons:

1) I find it cumbersome having to save a draft of a plot, go to the website, check the filters, and then go back to my Notebook/ script to make changes.

2) If I'm working on internal/ confidential data, I don't want to first upload it to an external website.

Thus, I wanted to find out whether there is a module that lets Python users apply colorblind filters to figures on the go, without breaking their workflow, and hallelujah, it exists!

Introducing colorblind

The colorblind package was released on January 11, 2021, so it is relatively new. Aside from the user guide included in the above link, I have not found another blog post or tutorial demonstrating how to use it and what it does in practice. The link to the package is the top result for "python colorblind" in Google; however, subsequent links are about Python colorblind palettes and not how to use colorblind in data visualization with Python.

I am writing this tutorial to demonstrate how to use the module to spot check your data visualizations and what you need to run it in a Jupyter Notebook.

Installation

The module can be installed with pip install colorblind. You will also need OpenCV, matplotlib and numpy, so make sure to have the modules installed on your system/virtual environment. Side note: I have been using a 2021 MacBook Pro with the brand new M1 Max chip, so I've been extra skittish about using virtual environments since some modules haven't offered native support for Apple Silicon chips - luckily this isn't the case with colorblind.

Types of colorblindness

As a quick example, I will first use the module to apply color blindness filters on a normal photo to demonstrate the different kinds of color blindness before exploring how data scientists can benefit from using this package in their data science workflow.

This is a screenshot of the 2018 World Cup match between Russia (red) and Saudi Arabia (green). If you can immediately tell the difference between the 2 teams, congratulations, you do not have red-green color blindness. If you have difficulty differentiating them, Russia is the team with white pants, and I hope this tutorial isn't where you learned that you are red-green color blind.

world_cup.jpg

Using this photo and colorblind, I can demonstrate what red-green color blind people see when they look at the picture. First, let's import our modules.

We want to load the photo into our Jupyter Notebook, but in a way that our computers can also "see" it. This is where OpenCV and numpy come in.

Our World Cup image is a numpy array with height 641, width 1134, and 3 color channels. Every colored image has 3 color channels: Red, Green, Blue - RGB. OpenCV, for legacy reasons, uses BGR when it reads images, so the first step is to reverse this order. Luckily, the image is a numpy array, so we can interact with it like we would with any numpy array.
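Here is a minimal sketch of the loading and channel-reversal step described above. The helper names load_rgb and bgr_to_rgb are my own, made up for illustration; cv2.imread and numpy slicing are the only library features used. I am assuming the photo sits next to the notebook as world_cup.jpg.

```python
import numpy as np


def bgr_to_rgb(img_bgr):
    """Reverse the last axis so BGR channel order becomes RGB."""
    return img_bgr[..., ::-1]


def load_rgb(path):
    """Read an image from disk with OpenCV and return it in RGB order."""
    import cv2  # imported here so bgr_to_rgb works even without OpenCV

    img_bgr = cv2.imread(path)  # cv2.imread returns None if the file cannot be read
    if img_bgr is None:
        raise FileNotFoundError(path)
    return bgr_to_rgb(img_bgr)


# Tiny stand-in for a photo: one pure-blue pixel stored in BGR order.
pixel_bgr = np.array([[[255, 0, 0]]], dtype=np.uint8)
pixel_rgb = bgr_to_rgb(pixel_bgr)
print(pixel_rgb[0, 0])  # the same blue pixel, now in RGB order
```

In the notebook this would be used as img = load_rgb("world_cup.jpg"), after which img.shape should report the height, width, and 3 channels mentioned above.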

Now that the image is loaded and processed, let's use colorblind to see what the World Cup match looked like for viewers who are red-green color blind. Scientifically, red color blindness is called "protanopia", which is what we'll indicate for the package.
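The call could look roughly like the sketch below. I am assuming the entry point is colorblind.simulate_colorblindness with a colorblind_type keyword, as I remember it from the package's user guide, so please double-check against the installed version. The wrapper name simulate is my own.

```python
def simulate(img_rgb, colorblind_type="protanopia"):
    """Return a simulated view of an RGB image for one type of color blindness."""
    # Assumption: module path, function name, and keyword follow the package's user guide.
    from colorblind import colorblind

    return colorblind.simulate_colorblindness(img_rgb, colorblind_type=colorblind_type)


# With the World Cup photo loaded as `img` in RGB order:
# red_sim = simulate(img, "protanopia")
```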

red_sim is a numpy array just like img. This means we will need to convert it into a picture for human vision. The matplotlib function imshow() will help us do that inside a Jupyter Notebook.
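One way to display the result is sketched below, with the original and the simulation side by side. The helper name show_side_by_side is hypothetical, and the cast to uint8 is a defensive assumption on my part: it presumes both arrays hold values in the 0-255 range.

```python
import matplotlib.pyplot as plt
import numpy as np


def show_side_by_side(original, simulated, title="protanopia"):
    """Plot the original and simulated images next to each other."""
    fig, axes = plt.subplots(1, 2, figsize=(12, 4))
    for ax, image, label in zip(axes, (original, simulated), ("original", title)):
        # Assumption: pixel values are in 0-255, so clip and cast before plotting.
        ax.imshow(np.clip(image, 0, 255).astype(np.uint8))
        ax.set_title(label)
        ax.axis("off")
    return fig
```

In the notebook, show_side_by_side(img, red_sim) would render both panels in the cell output.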

Yikes! This is what the match looked like for those with red color blindness. Now let's simulate green color blindness (deuteranopia). I will use a different photo of the same World Cup match, taken from a different angle.

world_cup2.jpg

When the players are zoomed in, the all-green uniform looks gray for a person with green color weakness, whereas the red-on-white uniform looks to be the same color as the turf. Not a very pleasant viewing experience, FIFA.

Another type of color blindness is blue-yellow color blindness, or tritanopia. To demonstrate this, I will use a third picture. Below are the combinations of uniform options for the Los Angeles Chargers - an American football team in the NFL whose primary colors are navy blue, powder blue, and gold.

chargers2.jpeg
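The three simulations used in this section (protanopia, deuteranopia, tritanopia) can be produced in one pass, as in the sketch below. I am assuming all three type names are accepted as strings by colorblind.simulate_colorblindness; only "protanopia" is confirmed earlier in this tutorial. The helper name simulate_all is my own.

```python
COLORBLIND_TYPES = ("protanopia", "deuteranopia", "tritanopia")


def simulate_all(img_rgb, types=COLORBLIND_TYPES):
    """Return a dict mapping each type name to its simulated image."""
    # Assumption: entry point and accepted type names follow the package's user guide.
    from colorblind import colorblind

    return {
        name: colorblind.simulate_colorblindness(img_rgb, colorblind_type=name)
        for name in types
    }


# With the Chargers photo loaded as `img` in RGB order:
# sims = simulate_all(img)
# blue_yellow_sim = sims["tritanopia"]
```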

From the simulation above, someone with tritanopia probably thinks the Chargers' primary colors are black, teal and light pink. If you do have blue-yellow color blindness, the uniform of player 97 above probably looks more like this (spoilers: this is a different team - the Miami Dolphins).

dolphins2.jpg

This is why the NFL and college football teams tend to stick to having the home team wear colored jerseys and the visiting team white - that way, viewers with the common forms of colorblindness can still enjoy the games. (Football: 1 - Soccer: 0)

What does this have to do with Data Visualization/ Data Science?

Now that we've established what color blindness is and how to use the colorblind package to simulate colorblindness in photos, let's discuss how this simple package can help data scientists create color blind-friendly figures.

Suppose you want to demonstrate a classification technique. Let's generate a toy plot with 2 classes and color them red and green.
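A toy plot like this can be generated as follows. The cluster centers, spread, sample size, and seed are arbitrary choices of mine, not values from the original notebook.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the plot is reproducible

# Two Gaussian blobs of 50 points each; centers and spread are arbitrary.
class_a = rng.normal(loc=(-2.0, -2.0), scale=0.8, size=(50, 2))
class_b = rng.normal(loc=(2.0, 2.0), scale=0.8, size=(50, 2))

fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(class_a[:, 0], class_a[:, 1], color="red", label="class A")
ax.scatter(class_b[:, 0], class_b[:, 1], color="green", label="class B")
ax.set_xlabel("feature 1")
ax.set_ylabel("feature 2")
ax.legend()
```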

We want to demonstrate how Support Vector Machines would split these 2 clusters into 2 separate classes by drawing a line between the 2 clusters. A person without red-green color blindness can probably tell where the SVM split should be.

Now, let's simulate what the above figure looks like for someone with deuteranopia. Because colorblind is built on top of OpenCV, we still need to save the figure locally as a photo, then read it back into our notebook before we can use colorblind. Thankfully, all of this can be done within 1 code block in Jupyter Notebook.
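The save, reload, and simulate round trip could be sketched as below. The helper name simulate_figure and the temporary file name are hypothetical, and I am again assuming colorblind.simulate_colorblindness is the package's entry point and that "deuteranopia" is an accepted type name.

```python
import os
import tempfile


def simulate_figure(fig, colorblind_type="deuteranopia", path=None):
    """Save a Matplotlib figure, read it back with OpenCV, and simulate color blindness."""
    import cv2
    from colorblind import colorblind  # assumed entry point, per the user guide

    if path is None:
        # Hypothetical scratch file in the system temp directory.
        path = os.path.join(tempfile.gettempdir(), "colorblind_check.png")
    fig.savefig(path, dpi=150)

    img_bgr = cv2.imread(path)  # OpenCV reads the saved figure in BGR order
    if img_bgr is None:
        raise FileNotFoundError(path)
    img_rgb = img_bgr[..., ::-1]  # reverse channels to RGB, as with the photos
    return colorblind.simulate_colorblindness(img_rgb, colorblind_type=colorblind_type)
```

Passing the scatterplot's figure object, as in green_sim = simulate_figure(fig), returns an array that can be displayed with imshow() like the photos earlier.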

Thanks to the deuteranopia filter, we can see that it is nearly impossible to distinguish between the red and green markers in our figure. When someone with red-green colorblindness looks at these 2 clusters, they would not be able to tell why SVM splits the dataset where it does.

Conclusion

The colorblind module allows us to do ad-hoc visual checks without having to disrupt our workflow or upload internal/ confidential/ work-in-progress data visualizations externally.

Aside from this package, there are troves of resources to help practitioners create color blind-friendly data visualizations, such as color blind palettes and redundant encoding as shown below.
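As one possible illustration of both ideas, the sketch below redraws a two-class scatterplot with a color blind-friendly color pair and a different marker shape per class. The blue and orange hex values come from the Okabe-Ito palette, which is my choice here and not something the original notebook specifies; the data is the same kind of arbitrary toy data as before.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
class_a = rng.normal(loc=(-2.0, -2.0), scale=0.8, size=(50, 2))
class_b = rng.normal(loc=(2.0, 2.0), scale=0.8, size=(50, 2))

# Okabe-Ito blue and orange, picked as one well-known color blind-friendly pair.
BLUE, ORANGE = "#0072B2", "#E69F00"

fig, ax = plt.subplots(figsize=(6, 4))
# Redundant encoding: each class differs in both color and marker shape.
ax.scatter(class_a[:, 0], class_a[:, 1], color=BLUE, marker="o", label="class A")
ax.scatter(class_b[:, 0], class_b[:, 1], color=ORANGE, marker="^", label="class B")
ax.legend()
```

Even if the two colors collapse into similar shades under a simulation, the circles and triangles still tell the classes apart.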